Abstract. This note gives a preliminary account of the transcoding, or
rechanneling, problem between different stimuli, as it is of interest for the
fields of natural interaction and affective computing. By considering a simple
example, namely the color response of an affective lamp to a sensed facial
expression, we frame the problem within an information-theoretic perspective.
A full justification in terms of the Information Bottleneck principle promotes
a latent affective space, hitherto surmised as an appealing and intuitive
solution, as a suitable mediator between the different stimuli.
1 Introduction

At the heart of non-verbal interaction between agents, either artificial or
biological, lies a rechannelling ability, namely the ability to gather data from
one kind of signal and instantaneously turn it into a different kind of signal.
In artificial agents, such a rechannelling or transcoding ability must be
simulated through some form of “computational synaesthesia”. Strictly speaking,
synaesthesia is a neurological phenomenon in which stimulation of one sensory or
cognitive pathway leads to automatic, involuntary experiences in a second sensory
or cognitive pathway [20]. Here, more liberally, we adopt it as a metaphor for
such rechannelling/transcoding of information [22].

In this note, as a case study, we consider the problem of transducing a sensed
facial expression into a color stimulus. Denote by V and C the random variables
(RVs) standing for a visible expression display and for an emitted color
stimulus, respectively. Then, the transcoding $V \mapsto C$ can be described in
probabilistic terms as sampling a specific color stimulus c when the expression
stimulus v is observed, namely


$$c \sim P(C \mid V = v), \qquad (1)$$


where $P(C \mid V)$ is the conditional probability density function (pdf)
defining the probability of generating a color stimulus c conditioned on the
observation of expression v. This kind of problem is of interest for many
applications in social signal processing [?], natural interaction [25], and
social robotics [?]. Most importantly, here we discuss how a principled solution
involves deep issues, in spite of the apparent specificity of the problem.

An appealing way to conceive transcoding is through the mediation of some kind
of latent space, in particular a space of affective or emotional experience,
which confers a unified semantics on the different kinds of non-verbal signals.
It has been argued that this could be necessary for grounding synaesthetic
cross-modal correspondences in a simulation-based theory of emotion and empathy
[29]. Also, the mediation of a continuous dimensional space has been advocated
for analyzing many different expressive modalities and for building affective
objects [55]. To this aim, we focus on the Pleasure/Arousal/Dominance space
(PAD, [51]) as a continuous latent space to support the “synaesthesia” of facial
expressions into color.

In this study we discuss how such a solution can be conceived and grounded in an
information-theoretic perspective, namely the Information Bottleneck (IB)
framework introduced in [26] (cf. Section 2).


[Figure panel: Affective Object]

Fig. 1: The Mood Lamp: an affective “synaesthetic” object that responds to the
user’s facial expressions by changing the color of the emitted light.


As a proof of concept we present the Mood Lamp (cf. Fig. 1). The Mood Lamp is a
kind of affective object, that is, a “physical object which has the ability to
sense emotional data from a person, map that information to an abstract form of
expression and communicate that information expressively, either back to the
subject herself or to another person” [22]. In particular, here a facial
expression is used to convey affective states to an Ikea RGB color lamp, which
responds by changing the color of the emitted light in accordance with the
affect.

Modeling computational synaesthesia as specified by Eq. (1) in the IB
perspective has the advantage of providing a principled approach, characterized
by a minimum of assumptions (Section 3). However, there are a number of subtle
difficulties to overcome that deserve discussion (cf. Sections 4 and 5).








2 Background and rationales 


Central to this work is the idea that the synaesthetic transduction
$V \mapsto C$ can be performed by resorting to an affect space, say E, as a
mediating factor.

Resorting to affect for transcoding stimuli may seem prima facie an
instrumental approach; however, two issues bear on this choice. First, the
insight of an affect space as a common factor for rechanneling between kinds of
information is not new in the psychological literature. On the one hand,
perception and emotion are closely linked [?]. For instance, in the specific
case of synaesthetic cross-modal correspondences, affective similarity [23] has
been suggested as a contributing factor: stimuli may be matched if they both
happen to increase an observer’s level of alertness or arousal, or if they both
happen to have the same effect on an observer’s emotional state, mood, or
affective state. An efficient treatment of affective synaesthesia has been
provided by Collier, who has shown [6] that both perceptual stimuli (such as
colours, shapes, and musical fragments) and human emotions can be represented in
a simple multidimensional space with two or three corresponding dimensions.
Clearly, this idea is consistent with the framework of an underpinning
continuous “affect space”, which can be approximated either by two primary
dimensions, e.g. valence and activity (arousal) [?], or by three, such as
pleasure, arousal, and dominance (PAD), as proposed by Russell and Mehrabian [?].



[Figure panels: (a) The IB framework, $\min I(V;E)$, $\max I(E;C)$; (b) A PGM representation of IB]


Fig. 2: Synaesthesia of facial expression V into the (lamp) color C as an
Information Bottleneck problem. The displayed expression is represented as a
random vector V, computed on the basis of facial landmarks L (displayed as red
dots superimposed on the face). (a) Transcoding $V \mapsto C$ is modelled as the
search for a compressed representation of V, namely the affect space E, which
achieves minimum redundancy while keeping the mutual information $I(E;C)$ about
the relevant variable C as high as possible. (b) The left graph $G_{in}$ encodes
the compression process; the right graph $G_{out}$ is the target model,
representing which relations should be maintained or predicted. The IB principle
boils down to minimizing the information maintained by $G_{in}$ and maximizing
the information preserved by $G_{out}$.













The second issue, which is tied in a subtle way to the previous one, is grounded
in the general and fundamental principle that an organism that maximizes the
adaptive value of its actions given fixed resources should have internal
representations of the outside world that are optimal in a very specific
information-theoretic sense [2]. In a communicative action, this optimization
problem is related to joint source-channel coding, namely the task of encoding
and transmitting information simultaneously in an efficient manner [?].

One route to do justice to both issues is the Information Bottleneck (IB) [26].
IB is an information-theoretic principle for extracting the relevant components
of an “input” random variable X with respect to an “output” random variable Y.
This is achieved by finding a bottleneck variable, that is, a compressed,
non-parametric and model-independent representation T of X that is most
informative about Y.

In our case, the intuition is that the bottleneck variable E is suitable to
capture the relevant affective aspects of the facial expression stimuli V that
are informative about the output color stimulus C (cf. Fig. 2a).

Denote by $I(X;Y)$ the mutual information [7]. The original IB approach
determines the auxiliary latent space E and the related mapping $V \mapsto E$
such that the mutual information $I(V;E)$ is minimized (to achieve maximum
compression), while the relevant information $I(E;C)$ is maximized. Hence

$$\min_{V \mapsto E} \; I(V;E) - \beta\, I(E;C), \qquad (2)$$

where $V \mapsto E$ is the rule for creating the internal representation, and
the positive parameter $\beta$ smoothly controls the tradeoff between compression
and preserved relevant information.
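For discrete variables the objective of Eq. (2) can be evaluated directly. The
Python sketch below is only an illustration under stated assumptions: V, E and C
are discrete, the joint $P(V,C)$ is given as a table, and the encoder
$q(e \mid v)$ is a row-stochastic matrix, so that E depends on C only through V.
The function names are hypothetical and not part of the original work.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats for a joint probability table p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)      # column vector p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)      # row vector p(y)
    mask = p_xy > 0                            # convention: 0 log 0 = 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def ib_lagrangian(p_vc, q_e_given_v, beta):
    """Value of I(V;E) - beta * I(E;C) for a hypothetical encoder q(e|v).

    p_vc: joint table P(V, C); q_e_given_v: rows indexed by v, each summing to 1.
    Assumes E is computed from V alone, so E sees C only through V.
    """
    p_vc = np.asarray(p_vc, dtype=float)
    q = np.asarray(q_e_given_v, dtype=float)
    p_v = p_vc.sum(axis=1)
    p_ve = p_v[:, None] * q                    # joint P(V, E)
    p_ec = q.T @ p_vc                          # P(E, C) = sum_v q(e|v) P(v, c)
    return mutual_information(p_ve) - beta * mutual_information(p_ec)

# Toy check: V and C are perfectly correlated fair bits
p_vc = np.array([[0.5, 0.0], [0.0, 0.5]])
identity = np.eye(2)          # E copies V: keeps all relevant information
collapse = np.ones((2, 1))    # E is constant: maximal compression
print(ib_lagrangian(p_vc, identity, beta=2.0), ib_lagrangian(p_vc, collapse, beta=2.0))
```

With $\beta = 2$ the identity encoder scores $-\log 2$ against $0$ for the
constant one, so the relevance term wins; for $\beta < 1$ the ordering flips and
compression wins.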

The optimization principle in Eq. (2) is very abstract; moreover, no analytical
solution is available. However, Friedman et al. [?] have shown that the IB
problem can be suitably reformulated in terms of a directed Probabilistic
Graphical Model (PGM, [16]) representation (cf. Fig. 2b). A directed PGM is a
graph-based representation where nodes denote RVs and arrows/arcs encode
conditional dependencies between RVs. Stated technically, the structure $G$
encodes the set of conditional independence assumptions over the set of RVs
$\{X_i\}$ (called the local independencies [?]) satisfied by the joint pdf
$P(\{X_i\})$ associated with $G$. Then, the joint pdf factorizes according to
$G$ [?], that is, P is consistent with $G$, written $P \models G$. Given a PGM
$G$,

$$\mathcal{I}^{G} = \sum_i I(X_i; \mathbf{Pa}_i^{G})$$

denotes the information computed with respect to the pdf $P \models G$ [?],
where $\mathbf{Pa}_i^{G}$ stands for the ensemble of parents of node $X_i$.
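The quantity $\mathcal{I}^{G}$ can be computed from a joint probability table
once the parents of each node are known. The sketch below assumes discrete RVs
stored as the axes of a NumPy array and a DAG given as a dictionary from axis
index to parent axes; both representations, and the function names, are
illustrative choices rather than anything taken from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability array of any shape."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def marginal(joint, keep):
    """Marginal table of the axes in `keep`, summing out all other axes."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop)

def graph_information(joint, parents):
    """I^G = sum_i I(X_i; Pa_i) for a DAG given as {axis: [parent axes]}."""
    total = 0.0
    for i, pa in parents.items():
        if not pa:
            continue                               # root nodes contribute zero
        h_i = entropy(marginal(joint, {i}))
        h_pa = entropy(marginal(joint, set(pa)))
        h_all = entropy(marginal(joint, {i, *pa}))
        total += h_i + h_pa - h_all                # I(X_i; Pa_i)
    return total

# G_out of Fig. 2b with axes (V, E, C): E is the only parent of V and of C,
# so I^{G_out} = I(V;E) + I(C;E). Here V, E, C are three copies of one fair bit.
joint = np.zeros((2, 2, 2))
joint[0, 0, 0] = joint[1, 1, 1] = 0.5
g_out = {0: [1], 1: [], 2: [1]}
print(graph_information(joint, g_out))             # 2 log 2, about 1.386
```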

Under these circumstances, the IB principle (Eq. (2)) can be cast in the
language of PGMs by considering two directed graphs $G_{in}$ and $G_{out}$,
together with the pdfs entailing such graphs, $Q \models G_{in}$ and
$P \models G_{out}$, respectively (cf. Fig. 2b). Thus, the information that we
would like to minimize is now given by $\mathcal{I}^{G_{in}} = I(V;E)$. The
relevant information that we wish to preserve is specified by the target model
$G_{out}$ as $\mathcal{I}^{G_{out}} = I(E;V) + I(E;C)$. Under these assumptions,
Eq. (2) can be rewritten as

$$\min_{V \mapsto E} \; \mathcal{I}^{G_{in}} - \beta\, \mathcal{I}^{G_{out}} \;\equiv\; \min_{Q} \; I(V;E) + \gamma\, D_{KL}\big(Q(V,E,C)\,\|\,P(V,E,C)\big), \qquad (3)$$

where $D_{KL}(Q(X)\,\|\,P(X))$ is the Kullback-Leibler divergence between
distributions Q and P [7]. The scale parameter $\gamma$ balances the two factors
and is related to $\beta$ as $\beta = \gamma/(1+\gamma)$. In the limit
$\gamma \to 0$ we are only interested in compressing the variable V. When
$\gamma \to \infty$ we concentrate on choosing a pdf Q that is close to the
distribution $P \models G_{out}$,

$$P(V,C,E) = P(V \mid E)\, P(C \mid E)\, P(E), \qquad (4)$$

by minimizing $D_{KL}(Q(V,E,C)\,\|\,P(V,E,C))$.

It has been shown that iterative approximate solutions to Eq. (2), which cycle
between determining $Q(E)$ and $Q(C \mid E)$ for a fixed $Q(E \mid V)$ and
computing $Q(E \mid V)$ for fixed $Q(E)$ and $Q(C \mid E)$, are a formulation of
the generalized Expectation-Maximization algorithm for clustering [23]. Clearly,
this holds when the latent space E is discrete. Indeed, it is readily seen that
at the extreme of the spectrum, $\gamma \to \infty$, the minimization in Eq. (3)
boils down to minimizing $D_{KL}(Q(V,E,C)\,\|\,P(V,E,C))$, which is but one
instance of the Variational Bayes method for learning the generative model
$P \models G_{out}$, as represented in the target model of Fig. 2b.
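For a discrete E, the alternating scheme just described can be written in a few
lines. The sketch below implements the standard self-consistent IB updates of
the original IB formulation [26] for the Lagrangian of Eq. (2), starting from a
random encoder. The function name, the number of clusters, the fixed iteration
count and the small `eps` guard are assumptions made for illustration; no
convergence test is included.

```python
import numpy as np

def iterative_ib(p_vc, n_clusters, beta, n_iter=200, seed=0):
    """Self-consistent IB iterations for discrete V, E, C (Lagrangian of Eq. (2)).

    Returns the encoder q(e|v), shape (|V|, n_clusters), and the decoder q(c|e),
    shape (n_clusters, |C|).
    """
    eps = 1e-12                                     # guards log(0) and division by 0
    p_vc = np.asarray(p_vc, dtype=float)
    p_v = p_vc.sum(axis=1)
    p_c_given_v = p_vc / p_v[:, None]
    rng = np.random.default_rng(seed)
    q_e_given_v = rng.random((len(p_v), n_clusters))
    q_e_given_v /= q_e_given_v.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Step 1: q(e) and q(c|e) for the current, fixed encoder q(e|v)
        q_e = p_v @ q_e_given_v
        q_ec = (q_e_given_v * p_v[:, None]).T @ p_c_given_v     # joint q(e, c)
        q_c_given_e = q_ec / (q_e[:, None] + eps)
        # Step 2: q(e|v) proportional to q(e) * exp(-beta * KL(p(c|v) || q(c|e)))
        log_ratio = (np.log(p_c_given_v[:, None, :] + eps)
                     - np.log(q_c_given_e[None, :, :] + eps))
        kl = np.sum(p_c_given_v[:, None, :] * log_ratio, axis=2)  # (|V|, n_clusters)
        logits = np.log(q_e + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)               # numerical stability
        q_e_given_v = np.exp(logits)
        q_e_given_v /= q_e_given_v.sum(axis=1, keepdims=True)
    # Decoder consistent with the final encoder
    q_e = p_v @ q_e_given_v
    q_c_given_e = ((q_e_given_v * p_v[:, None]).T @ p_c_given_v) / (q_e[:, None] + eps)
    return q_e_given_v, q_c_given_e

# Toy data: expressions 0 and 1 co-occur with color 0, expressions 2 and 3 with color 1
p_vc = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
q_e_given_v, q_c_given_e = iterative_ib(p_vc, n_clusters=2, beta=50.0)
print(q_e_given_v.round(3))
```

For a large $\beta$ the encoder becomes nearly deterministic and groups the
expressions by the color they predict, which is the clustering behaviour the text
attributes to the discrete case.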

When the transcoding operation relies upon a continuous latent space, as in our
case, the IB approach represented in terms of $P \models G_{out}$ is reminiscent
of several latent factor models for paired data, such as Bayesian factor
regression, Probabilistic Partial Least Squares and Probabilistic CCA [16].

3 Methods 

The IB approach provides a principled justification for the use of a mediating
latent space for simulating computational synaesthesia. After the learning stage,
when the distribution factors of the target joint pdf are available, the
transcoding in Eq. (1) can be performed via the latent space E:

$$e \sim P(E \mid V = v), \qquad c \sim P(C \mid E = e). \qquad (5)$$

It is worth remarking that learning procedures implementing optimization (2) or
(3) aim at designing from scratch a latent space that is optimal with respect to
the given constraints and the joint distribution, here $P(V,C)$. In the case
study we are considering, conditions are slightly different. First, the latent
space E is not constructed abstractly, but is chosen under the guidance of
psychological theories of emotion; this somewhat simplifies some machine
learning issues: for instance, the dimensionality of the space need not be
learned. Second, the joint pdf is not straightforwardly available.

As to the first issue, we assume a core affect representation. Core affect is a
neurophysiological state that underlies simply feeling good or bad, drowsy or
energised; it can be experienced as free-floating, or mood, or can be attributed
to some cause (and thereby begin an emotional episode) [?]. Thus, it is a
continuous latent space, and a suitable representation is provided by the PAD
space proposed by Mehrabian and Russell [21]. This space is described along
three nearly independent continuous dimensions: Pleasure-Displeasure (measured by
P), Arousal-Nonarousal (A), and Dominance-Submissiveness (D); thus,
$E = [P\ A\ D]^T$.

Note that, under the assumption of an actual affective state $E = e$, it is easy
to show, by using Bayes’ rule and the joint pdf factorisation in Eq. (4), that
$P(V,C \mid E) = P(V \mid E)\,P(C \mid E)$, thus $V \perp C \mid E$. That is,
if the affective state is given, then V and C are conditionally independent. The
key issue is thus obtaining the “mapping” probabilities $P(E \mid V)$ and
$P(C \mid E)$. To this end, we can make the simplifying assumption of a Gaussian
IB [5]. In this case an optimal compression E is obtained with a noisy linear
transformation of V:

$$e = W_E\, v + \xi_E, \qquad \xi_E \sim \mathcal{N}(0, \Sigma_{\xi_E}), \qquad (6)$$

where $\xi_E$ is an additive noise term sampled from a zero-mean Gaussian pdf.
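The conditional independence $V \perp C \mid E$ implied by Eq. (4) can be checked
numerically on a toy discrete model. The sketch below assumes hypothetical sizes
(3 affect states, 4 expression values, 5 colors) and random conditional tables;
it only illustrates the algebra, not the continuous PAD model.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical discrete model: 3 affect states, 4 expression values, 5 colors
p_e = rng.dirichlet(np.ones(3))                   # P(E)
p_v_given_e = rng.dirichlet(np.ones(4), size=3)   # P(V | E), rows indexed by e
p_c_given_e = rng.dirichlet(np.ones(5), size=3)   # P(C | E), rows indexed by e

# Joint built from the factorisation of Eq. (4); axes ordered (v, c, e)
joint = np.einsum('ev,ec,e->vce', p_v_given_e, p_c_given_e, p_e)

# P(V, C | E) versus the product P(V | E) P(C | E)
p_vc_given_e = joint / joint.sum(axis=(0, 1), keepdims=True)
product = p_vc_given_e.sum(axis=1, keepdims=True) * p_vc_given_e.sum(axis=0, keepdims=True)
print(np.allclose(p_vc_given_e, product))        # True: V and C independent given E

# Marginally, V and C need not be independent: compare P(V, C) with P(V) P(C)
p_vc = joint.sum(axis=2)
gap = np.abs(p_vc - np.outer(p_vc.sum(axis=1), p_vc.sum(axis=0))).max()
print(gap)
```

The second printed value measures the departure from marginal independence; for
generic parameters it is nonzero, meaning that the dependence between expression
and color is carried entirely by the affective state.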

Similarly, the most natural choice for color is a continuous space; e.g., in
studies concerning relationships between color and emotion, the HSL space,
defined on Hue (H), Saturation (S) and Luminance (L), has been used [11, 14].
Thus, a generative model for the mapping $P(C \mid E)$ is

$$c = W_C\, e + \xi_C, \qquad \xi_C \sim \mathcal{N}(0, \Sigma_{\xi_C}). \qquad (7)$$
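Combining Eqs. (5)-(7), transcoding reduces to two successive Gaussian draws. The
sketch below assumes that the weight matrices $W_E$, $W_C$ and the noise
covariances are already available (e.g. from the regressions described in
Sections 4 and 5); the function name and the placeholder parameters in the usage
lines are hypothetical.

```python
import numpy as np

def transcode(v, W_E, Sigma_E, W_C, Sigma_C, rng):
    """Two-stage sampling of Eq. (5) under the linear-Gaussian maps of Eqs. (6)-(7)."""
    # e ~ P(E | V = v): noisy linear compression of the expression vector, Eq. (6)
    e = rng.multivariate_normal(W_E @ v, Sigma_E)
    # c ~ P(C | E = e): noisy linear generation of the color vector, Eq. (7)
    c = rng.multivariate_normal(W_C @ e, Sigma_C)
    return e, c

# Placeholder parameters: 7 expression parameters, 3 PAD dimensions, 3 color channels
rng = np.random.default_rng(0)
W_E, W_C = rng.normal(size=(3, 7)), rng.normal(size=(3, 3))
v = rng.normal(size=7)
e, c = transcode(v, W_E, 0.01 * np.eye(3), W_C, 0.01 * np.eye(3), rng)
```

Sampling rather than taking the means keeps the stochastic reading of Eq. (1);
replacing both draws with `W_E @ v` and `W_C @ e` gives a deterministic lamp.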

Eqs. (6) and (7) nicely simplify the synaesthetic mapping to a pair of
regressions on a joint latent space; however, the second issue, related to the
actual availability of $P(V,C)$, must be taken into account. Needless to say,
the use of a continuous affect space brings along a number of challenges. In the
psychological literature, fleeting changes in the countenance of a face are
considered to be “expressions of emotion” (EEs) and have been systematically
investigated by Ekman [9] in a categorical perspective. Ekman’s work has fostered
a vast amount of theoretical and empirical work, which has been particularly
influential in the affective computing community [28]. Under these
circumstances, finding the map $V \mapsto E$ has mostly relied on a pattern
recognition approach to infer emotions from expressions under the fundamental
assumption of basic emotions, for example by considering the discrete set
E = {joy, sadness, anger, disgust, surprise, fear}. By contrast, Eq. (6) assumes
a probabilistic relationship between E and V where E is continuously defined.

A second problem to solve is related to Eq. (7), that is, learning the mapping
$E \mapsto C$. In the past decades, only a few researchers have investigated the
relationship between color and emotion [27, 15, 3, 14, 25] (and often in the
sense of emotions elicited by colors, and not vice versa). In this case, the
main problem is setting up a minimal training set, which we derive from data
available in the psychological literature. These issues are addressed in the
following sections.

4 From face expression to mood 

In this section we detail how we solve the problem of learning a probabilistic
relationship between E and V, where E is continuously defined according to the
PAD model. To this end, we exploit the results of experimental studies that have
evaluated the PAD values of discrete emotion states, e.g., [12].

A very first step concerns facial landmark localisation, which can be
summarised as follows. Denote by $L = \{l^1, l^2, \dots, l^n\}$ the locations of
n landmark parts of the face, and by $F = \{f^1, f^2, \dots, f^n\}$ the measured
detector responses, where $f^i = \phi(\mathcal{I}, l^i)$ is the response or
feature vector provided by a local detector at location $l^i$ in image or frame
$\mathcal{I}$. Then, localisation can be solved by finding the value of L that
maximises the probability of L given the responses from local detectors, namely
$L^* = \arg\max_L P(L \mid F)$. Following [8], we exploit a part-based framework
that integrates an effective local representation based on sparse coding. Sparse
coding has recently gained currency in face analysis (e.g., [?]). In particular:

$$L^* = \arg\max_{L} \sum_{k=1}^{m} \int_{t} \prod_{i=1}^{n} P(\Delta l^{i}_{k,t})\, P(l^{i} \mid f^{i})\, dt, \qquad (8)$$

where the prior $P(\Delta l^{i}_{k,t})$ accounts for the shape, or global,
component of the model, and $P(l^{i} \mid f^{i})$ for the appearance, or local,
component. For the local component $P(l^{i} \mid f^{i})$, we resort to
Histograms of Sparse Codes to compute the patch responses, which we learn from
facial images (see [8] for details).

For each image/frame we consider 40 landmarks $L = [l^1 \cdots l^{40}]^T$, as
shown in Fig. 3, and we map them into a vector of visible expression parameters V
by measuring the landmark displacements. This step, in a vein similar to Action
Units approaches [5], is aimed at capturing the expression movements within local
face regions, such as mouth-bent, eye-open and eyebrow-raise, as detailed in
Tab. 1.



Fig. 3: The 40 facial landmarks.

Table 1: Visual expression parameters (EP) via local landmark displacements

Name                       EP
Eyes height                $v_1$
Eyes/brows space           $v_2$
Eyebrow’s inner height     $v_3$
Eyebrow’s outer height     $v_4$
Mouth width                $v_5$
Mouth openness             $v_6$
Mouth twist                $v_7$


The extracted expression parameters $V \in \mathbb{R}^7$ are put in
correspondence with PAD values, $E \in \mathbb{R}^3$, by using Eq. (6). In the
current simulation, a multilinear ridge regression has been used, that is, a
penalized least squares method that adds a Gaussian prior on the parameters to
encourage them to be small. Such a model has interesting connections to latent
variable inference [16].
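The paper does not report the regularisation strength or preprocessing used for
the ridge regression. As a hedged sketch, the closed-form ridge estimate below
fits $E \approx W_E V + b$ on centred data, leaving the intercept $b$
unpenalised (an assumption, since Eq. (6) has no intercept); the function name is
hypothetical and the data in the usage lines are synthetic placeholders, not the
authors' measurements.

```python
import numpy as np

def fit_ridge(X, Y, lam):
    """Closed-form multivariate ridge regression, Y ~ X @ W.T + b.

    X: (n, d) inputs, e.g. expression parameters V; Y: (n, k) targets, e.g. PAD values E.
    The intercept b is left unpenalised by centring the data (an assumption).
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    d = X.shape[1]
    # MAP estimate under a zero-mean isotropic Gaussian prior on the weights
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Yc).T
    b = y_mean - W @ x_mean
    return W, b

# Synthetic placeholder data: 50 samples of 7 expression parameters -> 3 PAD values
rng = np.random.default_rng(0)
V = rng.normal(size=(50, 7))
W_true = rng.normal(size=(3, 7))
E = V @ W_true.T + 0.05 * rng.normal(size=(50, 3))
W_E, b = fit_ridge(V, E, lam=1.0)
```

Larger values of `lam` correspond to a tighter Gaussian prior and shrink $W_E$
towards zero; in practice the value would be chosen by cross-validation.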





















5 The color of mood 


Here, we discuss some subtleties related to the mapping $E \mapsto C$ in order to
learn the generative model of Eq. (7). Recall that we represent color as a random
vector in the HSL color space, i.e. $c = [H\ S\ L]^T$.

The seminal work investigating the relationship between color and emotion is
that by Valdez and Mehrabian [?]. They mainly studied how saturation S and
luminance L affect PAD. In [15, 3, 25] the emotions elicited by basic colors
have been presented qualitatively. Only recently has a synthesis of these
approaches been proposed [?], aiming at allowing robots to express the intensity
of emotions by coloring and blinking LEDs placed around their eyes. This work is
limited in that it resorts to only two distinct values for both S and L, and
four values for the hue H, so that different emotions end up being represented
by the same color.

In our study, we propose a finer correspondence model, preserving maximum
representativeness of the three HSL components. More precisely, as to S and L,
following [14], we invert the dependency of PAD values proposed in [3], while
retaining the reported coefficients. Define

$$P = 0.69L + 0.22S, \qquad A = -0.31L + 0.60S, \qquad D = -0.76L + 0.32S. \qquad (9)$$

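Then, since Eq. (9) provides three equations in the two unknowns S and L, these
can be derived in the least-squares sense as

$$c = (W^T W)^{-1} W^T e, \qquad (10)$$

where $c = [L\ S]^T$,

$$W = \begin{bmatrix} 0.69 & 0.22 \\ -0.31 & 0.60 \\ -0.76 & 0.32 \end{bmatrix},$$

and $e = [P\ A\ D]^T$.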
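Eq. (10) is the ordinary least-squares (pseudo-inverse) solution of the
overdetermined system in Eq. (9). A minimal NumPy sketch follows; the function
name is hypothetical, it solves the normal equations instead of forming the
inverse explicitly, and it does not clip or rescale the resulting (L, S), since
the text does not state how out-of-range values are handled.

```python
import numpy as np

# Coefficients of Eq. (9): rows give P, A, D; columns multiply L and S
W = np.array([[0.69, 0.22],
              [-0.31, 0.60],
              [-0.76, 0.32]])

def pad_to_ls(e):
    """Least-squares [L, S] for a PAD vector e = [P, A, D], as in Eq. (10)."""
    e = np.asarray(e, dtype=float)
    # Normal equations (W^T W) c = W^T e, solved without an explicit inverse
    return np.linalg.solve(W.T @ W, W.T @ e)

# A PAD vector generated exactly by Eq. (9) is mapped back to its (L, S)
print(pad_to_ls(W @ np.array([0.5, 0.8])))    # approximately [0.5, 0.8]
```

For a PAD vector that no (L, S) pair reproduces exactly, the function returns the
pair whose predicted PAD values are closest in Euclidean distance.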


Eq. (10) provides a partial color mapping $E \mapsto (S, L)$. To complete the
picture we need to take into account the hue values H. Unfortunately, the
hue/PAD relation proposed in [27] cannot be inverted. We thus derive this
component from Plutchik’s psycho-evolutionary emotion theory [19]. In his work,
each emotion is associated with a given hue value, while saturation and
luminance vary according to the emotion intensity (Fig. 4). As we need an
association between PAD and hue values, we rely on the classification made by
Mehrabian [21], adopting the PAD values of a subset of corresponding affective
states, as tabulated in Tab. 2.

Then, the PAD values and the corresponding HSL values can serve, respectively,
as feature and target sets for learning the multivariate linear regression model
given in Eq. (7). As in the case of Eq. (6), this is accomplished via ridge
regression.
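The regression of Eq. (7) can be fitted on the eleven rows of Table 2. The sketch
below solves ridge regression as ordinary least squares on an augmented system
with an unpenalised intercept; the function names are hypothetical and the
regularisation value is a placeholder, since the paper does not report the one
used. Note that a linear model treats the hue H as an ordinary number, although
it is an angle (0 and 360 degrees coincide).

```python
import numpy as np

# PAD features (P, A, D) and HSL targets (H, S, L) transcribed from Table 2
pad = np.array([[0.81, 0.51, 0.46], [0.62, 0.75, 0.38], [-0.64, 0.60, -0.43],
                [-0.62, 0.82, -0.43], [0.16, 0.88, -0.15], [-0.63, -0.27, -0.33],
                [-0.65, -0.62, -0.33], [-0.58, 0.40, 0.01], [-0.51, 0.59, 0.25],
                [0.64, 0.51, 0.17], [0.49, 0.57, 0.45]])
hsl = np.array([[60, 67, 100], [60, 67, 100], [120, 100, 59], [120, 100, 50],
                [203, 100, 88], [240, 68, 100], [300, 22, 100], [0, 45, 100],
                [0, 100, 100], [29, 45, 100], [29, 100, 100]], dtype=float)

def ridge_augmented(X, Y, lam):
    """Ridge regression as least squares on an augmented system.

    Minimises ||[X 1] w - Y||^2 + lam * ||weights||^2, leaving the intercept unpenalised.
    Returns a (d + 1, k) array: d weight rows followed by the intercept row.
    """
    n, d = X.shape
    X1 = np.hstack([X, np.ones((n, 1))])        # append an intercept column
    penalty = np.sqrt(lam) * np.eye(d + 1)
    penalty[-1, -1] = 0.0                        # no shrinkage on the intercept
    X_aug = np.vstack([X1, penalty])
    Y_aug = np.vstack([Y, np.zeros((d + 1, Y.shape[1]))])
    coef, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
    return coef

def predict_hsl(coef, e):
    """Predicted [H, S, L] for a PAD vector e under the fitted linear model."""
    return np.append(np.asarray(e, dtype=float), 1.0) @ coef

coef = ridge_augmented(pad, hsl, lam=0.1)       # lam = 0.1 is a placeholder value
print(predict_hsl(coef, [0.81, 0.51, 0.46]))    # fitted HSL for the "joy" row
```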

Finally, the obtained HSL values are converted into RGB space. This last step has
a practical motivation. As anticipated in the Introduction, the transcoding
process has been realised experimentally through the Mood Lamp, an affective
object conceived as i) a sensing interface, namely a low-cost web camera/notebook
communicating via USB with ii) a modified Ikea lamp, equipped with an Arduino UNO
board to control an RGB LED (see Fig. 5).
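The HSL to RGB conversion is standard; Python's built-in `colorsys` module
provides it as `hls_to_rgb` (note the H, L, S argument order and the [0, 1]
ranges). The wrapper below, including its name, the clamping of S and L, and the
8-bit output that an LED driver would typically expect, is an assumption for
illustration; the protocol used on the Arduino side is not described in the text.

```python
import colorsys

def hsl_to_rgb8(h_deg, s_pct, l_pct):
    """Convert HSL (H in degrees, S and L in percent) to an 8-bit (R, G, B) tuple."""
    def to_unit(x):
        # Clamp a percentage to [0, 100] and rescale it to [0, 1]
        return min(max(x, 0.0), 100.0) / 100.0
    # colorsys expects hue, lightness, saturation (HLS order), all in [0, 1]
    r, g, b = colorsys.hls_to_rgb((h_deg % 360) / 360.0, to_unit(l_pct), to_unit(s_pct))
    return tuple(round(255 * x) for x in (r, g, b))

print(hsl_to_rgb8(0, 100, 50))     # (255, 0, 0), pure red
```

Under the standard HSL definition any color with L = 100% is white; if the
third component in Table 2 is in fact a brightness value (as in HSV/HSB),
`colorsys.hsv_to_rgb` would be the matching conversion.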







Fig. 4: Plutchik’s wheel. Relationships 
between emotions and colors. Hue is 
associated to a specific emotion, while 
saturation and luminance determine its 
intensity. 


Emotion      H     S     L      P      A      D
joy          60    67    100    0.81   0.51   0.46
ecstasy      60    67    100    0.62   0.75   0.38
fear         120   100   59    -0.64   0.60  -0.43
terror       120   100   50    -0.62   0.82  -0.43
amazement    203   100   88     0.16   0.88  -0.15
sadness      240   68    100   -0.63  -0.27  -0.33
boredom      300   22    100   -0.65  -0.62  -0.33
annoyance    0     45    100   -0.58   0.40   0.01
anger        0     100   100   -0.51   0.59   0.25
interest     29    45    100    0.64   0.51   0.17
vigilance    29    100   100    0.49   0.57   0.45

Table 2: Color values of emotions according to Plutchik’s wheel (H in degrees,
S and L on a 0-100 scale) and their associations to Mehrabian [21] scores of
Pleasure, Arousal and Dominance.



Fig. 5: Color control: the actual color stimulus is generated through a modified 
Ikea lamp, where the RGB LED is controlled by an Arduino UNO board. 


6 Conclusion and further outlooks 

We have discussed how the IB framework provides a parsimonious and principled
account of using a latent affect space to mediate the rechanneling between
different stimuli. As a final comment, we think it appropriate to remark that
the general formalism expounded here admits a far wider range of applicability
than the one for which it has been presented in this work. The framework could be
usefully adopted for current affective computing systems, which increasingly rely
on the availability of different sensors (e.g., for monitoring autonomic
activity) and brain interfaces [28]. Indeed, such systems are confronted with
the issue of finding relations in high-dimensional and heterogeneous data spaces;
data fusion is one example, and several others would emerge from the application
of this approach to concrete instances.
Fig. 6: Experimental results of transcoding using the Mood Lamp.